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METHOD OF IDENTIFYING ENDMEMBER SPECTRA VALUES FROM 
~~ HYPERSPECTRAL IMAGE DATA 

The present invention relates to a method of identifying 
5 endmember spectral values from multispectral or 

hyperspectral image data, which is particularly useful in 
identifying different materials from multispectral or 
hyperspectral images. 

10 It is known to collect remote sensing data to provide 

images of scenes to aid in broad scale discrimination of 
various features of land scanned including identifying 
mineral deposits and vegetation. Two examples of 
hyperspectral scanners are NASA's 224 band AVTRIS, which 
15 has bands spaced about every 10 nanometers in a range from 
400 to 2500 nanometers and the 128 band Australian 
commercial scanner, HyMap, which covers a similar 
wavelength range with about 16 nanometer resolution. 

20 A goal is therefore to identify the components of each 

pixel in the hyperspectral image. This can be done using 
comparison with a library of spectra of "pure" materials . 
«Pure» materials in a hyperspectral image are often termed 
endmembers . 

Depending on the resolution of the image obtained from the 
spectral scanner, an individual pixel may range in size 
from 5 to 10 meters in images from an aircraft scan or 10 
to 30 meters from a satellite scan. Each pixel therefore 
30 will relate to a portion of a scene which will usually 
include a mixture of material components. It is not 
uncommon to find that not all of the pure spectral 
representations of endmembers are present in a scene. 

35 images are also subject to distortion due to noise from 
various sources including instruments, atmospheric 
interference, viewing geometry and topography of the area 
scanned. Corrections for these distortions are still not 
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sufficiently accurate to allow for reliable comparisons to 
reference libraries. Also, many remotely sensed scenes 
contain materials not in libraries. Therefore, there are 
problems with matching spectra with ground-based 
libraries. There is consequently interest in identifying 
the component materials represented in a scanned scene, 
without reference to a library. 

Similar problems occur in other fields where it is desired 
to determine endmembers from multispectral, hyperspectral 
or other data where a signal is detected on a number of 
channels or bands. For example a similar problem occurs 
in the analysis of proteomics and genomics array data 
where the signal represents cell or organism response 
across a range of proteins, cDNAs or oligonucleotides. In 
this context, each protein, cDNA or oligonucleotide is 
regarded as being equivalent to a wavelength or band in 
the hyperspectral or multispectral context. Similar 
problems also occur in fluorescence imaging such as 
fluorescence microscopy. 

In the art the terms multispectral and hyperspectral, 
multidimensional and hyperdimensional etc. are used, with 
"hyper" generally meaning more than -multi". This 
distinction is not relevant for the purposes of this 
invention. For convenience, throughout the rest of the 
specification the term 'multispectral " will be used to 
refer to both multispectral and hyperspectral data. The 
term 'multidimensional" and other -multi" terms will 
likewise be used to mean more than one dimension. 

Current solutions of finding endmembers often involve 
-whitening" or "sphering" the data and then fitting to the 
data a multidimensional simplex having a number of 
vertices equal to the number of endmembers. 

The bands of a multispectral image are usually highly 
correlated. -Whitening" involves transforming the data to 
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be uncorrected with a constant variance and preferably an 
approximately Normal distribution of errors. It is also 
desirable to compress the dimensionality of the data to 
reduce calculation time. 

A widely used algorithm to "whiten" the data is to 
compress the information into a smaller number of bands by 
use of the Minimum Noise Fraction (MNF) transform. Thxs xs 
disclosed in Green, A., Berman, M. , Switzer, P., and 
Craig, M. (1988) . A transformation for ordering 
multispectral data in terms of image quality with 
implications for noise removal. IEEE Transactions on 
Geoscience and Remote Sensing, 26:65-74. 

15 Simplex fitting using the pixel purity index (PPI) method 
is disclosed in Boardman, J. Kruse, F., and Green, R. 
(1995) Mapping target signatures via partial unmixing of 
AVIRIS data, in Green, R. (editor), Summaries of the Fifth 
Annual CTPL Airborne Earth Science Workshop, volume 1, 
AVIRIS Workshop, pp 23-26. JPL Publ . 95-1, NASA, Pasadena, 
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CA. 



One of the main disadvantages of Boardman' s method is that 
it requires considerable manual intervention in 
25 processing. 

An alternative to Boardman' s method is the N-FINDR 
algorithm by Winter, M. (1999). Fast autonomous spectral 
endmember determination in hyperspectral data. In 
30 Proceedings of the 13* International Committee on Applied 
Geologic Remote Sensing, Vancouver, vol. 2, pp 337 334. 
This process is fully automated. After transformation to 
(M-l) dimensional subspace, this algorithm finds the M- 
dimensional simplex of maximum volume constrained to lxe 
35 within the data cloud. Another alternative is to 

construct the minimum volume simplex enclosing the data 
cloud, which is provided by Craig, M. (1994) . Minimum- 
volume transforms for remotely sensed data. IEEE 
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Transactions on Geoscience and Remote Sensing, 32:542-552. 

These solutions cannot satisfactorily deal with the common 
situation where pure or almost pure endmembers are absent 
5 from the scene. Furthermore, they do not deal well with 
noise in the data. 

According to the present invention there is provided a 
method of identifying endmember spectra values from 
10 multispectral image data, where each multispectral data 
value is equal to a sum of mixing proportions of each 
endmember spectrum, said method including the steps of: 

processing the data to obtain a multidimensional 
simplex having a number of vertices equal to the number of 
15 endmembers, the position of each vertex representing a 
spectrum of one of the endmembers, 

wherein processing the data includes: 
providing starting estimates of each endmember 
spectrum for each image data value; 
20 estimating the mixing proportions for each data value 

from estimates of the spectra of all the endmembers; 

estimating the spectrum of each endmember from 
estimates of the mixing proportions of the spectra of all 
the endmembers for each image data value; 
25 repeating estimation steps until a relative change in 

the regularised residual sum of squares is sufficiently 
small, the regularised residual sum of squares including a 
term which is a measure of the size of the simplex. 

30 Preferably the term used in the regularised residual sum 
of squares is the sum of the squared distances between all 
of the simplex vertices. 

Preferably the step of providing the starting estimates 
35 includes choosing starting points with a high pixel purity 
index score. More preferably the starting estimates are 
well separated. 
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Preferably the relative change in the regularised residual 
sum of the squares is sufficiently small when the ratio of 
successive values of regularised residual sum of squares 
is less than a tolerance. Preferably the tolerance is 
0.99999. 

Preferably processing the data includes whitening the 
data. Preferably whitening the data includes conducting a 
transform of the data into data that is not band 
correlated. Preferably processing the data includes 
removing bands that do not have a high signal to noise 
ratio . 



Preferably the step of estimating the spectrum of each 
15 endmember is conducted using a linear estimation 
technique . 

Preferably the step of estimating the mixing proportions 

is conducted using a quadratic piOyj-cuumj.*^ e 

20 technique. 

In order to provide a better understanding a preferred 
embodiment of the present invention will now be described 
in detail, by way of example only, in relation to Figure 1 
25 which is a diagrammatic representation of a simple example 
of the use of the method of the present invention. 

Multispectral image data is obtained from a multispectral 
scanner, such as the AVIRIS airborne scanner. A typical 

30 scan for mineral applications includes a short wave infra 
red scan with wavelengths in the region of 2,000 to 2,500 
nanometers. This spectral range is useful for exhibiting 
distinctive shapes for important clay minerals. Typically 
this will provide 10 's of thousands to millions of pixels 

35 or even many more. 

A MNF transform is performed on the relevant bands of data 
to produce variables which are uncorrelated and 
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approximately Normally distributed with an estimated error 
variance of 1. It is usual to retain MNF bands with the 
highest signal to noise ratios. 

we let d be the number of MNF bands retained, N is the 
number of pixels, and M is the number of endmembers 
(assumed to be less than or equal to d+1) . 

It is convenient to think of the MNF data as an N x d 
matrix, whose ith row is written as Xi, and whose jth 
column is written as xj. Similarly, it will also be 
convenient to think of the unknown endmembers as an M x d 
matrix, whose kth row is written as E* and whose jth column 
is written as ej. 

The MNF transformed data can be represented by the 
following formula: 



M 

Xi = X p ik E k + e±, i = 1,...,W. • (D 



Here ft is an error vector, and the p ik are mixing 
proportions that satisfy the constraints of: 



M 



p ik > 0,k = 1,...,M, £ Pik =1, i =1,~,N. (2) 



k=l 



If the error term is ignored then (1) and (2) tell us that 
the data lies inside a simplex in (M-l) dimensional space, 
and the MNF representations of the M endmembers are at the 
vertices of the simplex. 

A least squares minimisation of equation (1) is conducted 
subject to the constraints (2) and a term that constrains 
the size of the simplex, while being faithful to the 
model. It can be shown that without the constraints the 
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solution converges to a simplex that is too large. This 
problem may be solved iteratively: given estimates of the 
endmember spectra, the proportions for each pixel are 
estimated, which is a quadratic programming problem; and 
5 given estimates of the proportions, the endmember MNF 
spectra are estimated, which is a linear estimation 
problem. 

The constraint is the addition of a term to the residual 
10 sum of squares which is a measure of the size of the 

simplex. A convenient term to add is the sum of squared 
distances between all of the simplex vertices . It can 
easily be shown that this is proportional to the sum of 
the variances of the simplex vertices over the d 
15 dimensions which is a quadratic function of the vertices 
and therefore computationally convenient. The regularised 
solution minimises : 



20 r = £ { {Xj — V*j) T (xj - poj) +Wbej>, < 3 > 

i=i 

where X is small, and where D=i M - 11 T /M. 

25 R is the regularised residual sum of squares; P is a N x M 
matrix of proportions of M endmembers for all N pixels; l„ 
is the M x M identity matrix and 1 is the vector of length 
M, where every entry equals 1. 

30 Formula (3) is minimised iteratively. 

in what follows, Pi will denote the estimated value of P 
after the 1th iteration, and either • J ,ij**l.-.d or E*,i. 

fe.! ar will denote the estimated endmembers after the 1th 

35 iteration. 

1. Let 6^0,3=1 d denote the starting values for the 

algorithm and let 2=1. 
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2. Let Pi denote the value of P minimising 
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Ri.i=L{ (xj-Pe^i-O'txj-Pe^i-i) +Xe-,,2-/ De^i-i). (4) 

subject to (2) . This is done using a quadratic 
programming algorithm. 

There are two things to note here. First, the second 
term in (4) is independent of P, and so only the 
first term needs to be minimised in this step. 
Second, we can separate the minimisation into N 
separate quadratic programming minimisations for the 
data at each of the N pixels. Specifically, for 
1=1,..., N, we find Pi*,.k=l M, which minimise 



M M 

(Xi - X Pi*£*,i-i)' r '(Xi - X Pi k E k ,i-i) w) 



subject to (2) . 

Let Rx.x^n denote the minimum value of Ri.i achieved. 
3. Let ej,i, j=l,~.,d denote the value of ej minimising 

d 

R lt2 = S {(x J -Piej) T (x i -Pie J ) + Xe/Dej}, (6) 

i=i 

This minimisation can be separated into d separate 
minimisations, and straightforward matrix algebra can 
be used to show that 

ej.i = (P/Pi+to) "*Pi T *j, J =1 d - (7) 

Let Ri,2.«a n denote the minimum value of R1.2 achieved. 
It can also be shown that 



d 
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Ri.2.«m = £ x/{I N -Pi(Pi T Pi+^»" 1 »i T >^. 

where I w is the N x N identity matrix. 
4. Let 

ri=Rl , 2 , min/ min 
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(9) 



Because each step in the iteration reduces R, we must 

have Si,a.«ta< ° r r ^ X - V^** 1 ri gets very clOSe 

to 1, the algorithm stops. More specifically: 
If n < tol, let 1=2+1 and go to 2 . Otherwise, stop 
and let ej,i, j=l,~,d, or equivalently B*,i# k=l,...,M, 
be the final estimates of the endmembers, and let the 
ith row of Pi give the final estimates of the 
proportions of each of the estimated endmembers 
present in the ith pixel. 

The process is terminated when a ratio of successive 
values of the regularised residual sum of squares is less 
than a tolerance (tol) . The default tolerance value is 
0.99999. Using this value in typical examples, 20 to 100 
iterations are required until the process stops. 

Using this method the projections of all the data onto 
this hyperplane need not lie inside the simplex. 

Figure 1 shows a simulated toy example with a true simplex 
as a solid line, an (unregularised) least squares solution 
as a dotted line and a regularised least squares solution 
as a broken line. The regularised least squares solution 
provides much better estimates of the true endmembers . 

Most of the information about the simplex is contained in 
data on or near boundaries of the data cloud. So if only 
data nearest the convex hull of the data cloud is used 
computation becomes quicker. In high dimensional problems, 
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points at or near the ends of random one dimensional 
projections of the data can be used. Alternatively points 
only on or near two dimensional convex hulls of all 
d(d-l)/2 MNF band pairs are used. 

The starting points for the iterative process can 
determine the outcome of the iterative process. Good 
starting points should be well separated in MNF space. 
Points with high PPI scores can be useful. The PPI scores 
are the number of times the data at each pixel are at or 
near one of the ends of these projections. 



Apart from the estimates of the endmembers an intermediate 
product of the algorithm is the endmember proportions in 
15 each pixel. The proportions give a meaningful idea of how 
much each endmember is represented in each pixel (assuming 
that mixing is proportional to area) . This can be 
represented as images /maps. A particularly useful 
diagnostic is the maximum proportion of each estimated 
20 endmember in the scene. The lower the maximum proportions 
for each endmember spectrum the further the estimated 
endmember is from the data cloud and the confidence in the 
estimate will be correspondingly smaller. For endmember 
estimates having a maximum proportion less than 0 . 5 it 
25 becomes difficult to estimate the true endmember spectrum. 

Another useful by-product of the algorithm is an image 
showing the contribution of each pixel to the regularised 
residual sum of squares. If there are some large residuals 

30 and especially if they are spatially clustered it is an 
indication that the model is not fitting the data 
adequately, either because the chosen value of M is too 
small or if only the data on or near the boundaries of the 
data cloud is used then important observations may have 

35 been omitted from the algorithm. Additional observations 
can be added to the data used and the algorithm re-run to 
see whether fitting can be improved. 
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